Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models

Author

Koji Nakano

Abstract

The main contribution of this paper is to present optimal parallel algorithms for computing the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters: the number p of threads, the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O(n/w + nl/p + l\log n)$ time units on the DMM and the UMM. We then show that $\Omega(n/w + nl/p + l\log n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O(n/w + nl/p + l\log n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $\sqrt{n} \times \sqrt{n}$ can be computed in $O(n/w + nl/p + l\log n)$ time units on the DMM and the UMM. Since computing the prefix-sums or the summed area table is at least as hard as computing the sum, these parallel algorithms are also optimal.

Key words: memory machine models, prefix-sums computation, parallel algorithm, GPU, CUDA

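Each term of the bound can be tied, informally, to one resource. This is not the paper's proof. It assumes, as in the usual description of the memory machine models, three things:

- at most $w$ words are transferred per time unit;
- every memory access takes $l$ time units to complete;
- a thread waits for its own request before issuing the next one.

The parameter values below are hypothetical, and constant factors are ignored.

```latex
\underbrace{\frac{n}{w}}_{\text{bandwidth}}
+ \underbrace{\frac{nl}{p}}_{\text{latency}}
+ \underbrace{l\log n}_{\text{combining depth}}
\qquad
n = 2^{20},\ w = 32,\ l = 100,\ p = 2^{14}:\quad
\frac{n}{w} = 32768,\quad \frac{nl}{p} = 6400,\quad l\log_2 n = 2000
```

Reading the terms:

- **$n/w$:** reading $n$ words at $w$ words per time unit.
- **$nl/p$:** with $p$ threads, at most $p$ requests complete every $l$ time units.
- **$l\log n$:** a binary combining tree over $n$ values has depth $\log n$, with at least one memory latency per level.

In this hypothetical setting the bandwidth term dominates. More generally, $nl/p \le n/w$ exactly when $p \ge wl$ (here $wl = 3200$). Beyond that many threads, the latency term no longer exceeds the bandwidth term.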

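A sequential Python simulation can make the two phases of a parallel sum concrete. This is a generic sketch, not the paper's algorithm. The function name `parallel_sum` and its round counter are hypothetical, and the loops stand in for p threads running in lockstep.

- **First phase:** thread t accumulates a[t], a[t+p], a[t+2p], ..., so at each step the threads read consecutive addresses.
- **Second phase:** the p partial sums are combined pairwise in $\lceil \log_2 p \rceil$ rounds.

```python
def parallel_sum(a, p):
    """Simulate a two-phase parallel sum with p hypothetical threads.

    Returns (total, rounds), where rounds counts the pairwise-combining
    rounds of the second phase, i.e. ceil(log2 p).
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    # Phase 1: thread t accumulates a[t], a[t + p], a[t + 2p], ...
    # At every step the p threads read p consecutive addresses.
    partial = [sum(a[t::p]) for t in range(p)]
    # Phase 2: combine partial sums pairwise; each round halves their number.
    rounds = 0
    while len(partial) > 1:
        if len(partial) % 2 == 1:
            partial.append(0)  # pad with the identity so every value has a partner
        partial = [partial[2 * i] + partial[2 * i + 1]
                   for i in range(len(partial) // 2)]
        rounds += 1
    return partial[0], rounds
```

For example, `parallel_sum(list(range(1, 101)), 8)` returns `(5050, 3)`: eight partial sums are halved to four, then two, then one.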

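The prefix-sums computation can be sketched with p hypothetical threads, each owning a contiguous block of about n/p elements. This is the standard block-wise scan, shown as an illustration rather than as the paper's algorithm; the name `block_prefix_sums` is hypothetical. It runs in three phases:

1. Every thread sums its own block.
2. A Hillis-Steele scan over the p block totals computes running offsets in $\lceil \log_2 p \rceil$ dependent rounds.
3. Every thread scans its block locally, starting from its offset.

```python
from itertools import accumulate


def block_prefix_sums(a, p):
    """Inclusive prefix sums of list a, simulating p hypothetical threads."""
    if p < 1:
        raise ValueError("p must be at least 1")
    size = -(-len(a) // p)  # ceil(n / p) elements per block
    blocks = [a[t * size:(t + 1) * size] for t in range(p)]
    # Phase 1: every thread sums its own block independently.
    totals = [sum(b) for b in blocks]
    # Phase 2: Hillis-Steele inclusive scan over the p block totals,
    # ceil(log2 p) rounds; each round reads the old values, then replaces them.
    step = 1
    while step < p:
        totals = [totals[i] + (totals[i - step] if i >= step else 0)
                  for i in range(p)]
        step *= 2
    # Phase 3: every thread scans its block, offset by all earlier blocks' totals.
    out = []
    for t, b in enumerate(blocks):
        offset = totals[t - 1] if t > 0 else 0
        out.extend(offset + s for s in accumulate(b))
    return out
```

For example, `block_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6], 3)` returns `[3, 4, 8, 9, 14, 23, 25, 31]`.

The contiguous blocks keep the sketch short. They do not model memory-machine access costs, however: at each step, threads would read addresses that are a full block apart. An implementation on the DMM or UMM would need to rearrange accesses so that a warp touches consecutive addresses.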

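One straightforward way to build a summed area table uses two passes of independent prefix sums, shown here without claiming it is the paper's method. The first pass runs along every row and the second down every column. Afterwards, entry (i, j) holds the sum of all inputs (x, y) with x ≤ i and y ≤ j.

The sketch assumes a list-of-lists matrix and uses `itertools.accumulate` for each sequential scan; in a parallel setting, all rows (and then all columns) would be scanned concurrently. The names `summed_area_table` and `rect_sum` are hypothetical.

```python
from itertools import accumulate


def summed_area_table(m):
    """sat[i][j] = sum of m[x][y] over all x <= i and y <= j."""
    # Pass 1: inclusive prefix sums along every row (rows are independent).
    rows = [list(accumulate(r)) for r in m]
    # Pass 2: inclusive prefix sums down every column (columns are independent).
    cols = [list(accumulate(c)) for c in zip(*rows)]
    # Transpose the column-major result back to row-major order.
    return [list(r) for r in zip(*cols)]


def rect_sum(sat, r0, c0, r1, c1):
    """Sum of rows r0..r1 and columns c0..c1 (inclusive) using at most 4 lookups."""
    total = sat[r1][c1]
    if r0 > 0:
        total -= sat[r0 - 1][c1]
    if c0 > 0:
        total -= sat[r1][c0 - 1]
    if r0 > 0 and c0 > 0:
        total += sat[r0 - 1][c0 - 1]
    return total
```

For `[[1, 2], [3, 4]]` the table is `[[1, 3], [4, 10]]`. Once the table is built, any rectangular sum costs at most four lookups, which is the usual reason for precomputing it.
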
Similar resources

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and the migration to IPv6 addresses. IP address lookup involves computing the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

Program-Centric Cost Models for Locality and Parallelism

Good locality is critical for the scalability of parallel computations. Many cost models that quantify locality and parallelism of a computation with respect to specific machine models have been proposed. A significant drawback of these machine-centric cost models is their lack of portability. Since the design and analysis of good algorithms in most machine-centric cost models is a non-trivial t...

Foundational Algorithms for Distributed Robot Swarms

In this paper, we study discrete swarm algorithms, where mobile robots (or “mobots”) move around interacting in an environment to solve computational problems. This work extends recent work on swarm algorithms in the distributed computing, artificial intelligence, and robotics literatures in that we allow for mobots to have additional memory so as to enable computations that are somewhat more s...

Paper Title

Prefix sums are an important parallel primitive, especially in massively-parallel programs. This paper discusses two orthogonal generalizations thereof, which we call higher-order and tuple-based prefix sums. Moreover, it describes and evaluates SAM, a GPU-friendly algorithm for computing prefix sums and other scans that directly supports higher orders and tuple values. Its templated CUDA imple...

Journal title: IEICE Transactions

Volume: 96-D

Publication date: 2013
